
    An assessment of the connection machine

    The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered, as well as important issues in the architecture and programming environment of connection machines in general. These are contrasted with the same issues in Multiple Instruction/Multiple Data (MIMD) multiprocessors and multicomputers.

    An improved Newton iteration for the generalized inverse of a matrix, with applications

    The purpose here is to clarify and illustrate the potential of variants of Newton's method for solving problems of practical interest on highly parallel computers. The authors show how to accelerate the method substantially and how to modify it successfully to cope with ill-conditioned matrices. The authors conclude that Newton's method can be of value for some interesting computations, especially in parallel and other computing environments in which matrix products are especially easy to work with.
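
    As a rough illustration of why matrix products are the key operation here, the sketch below implements the classical, unaccelerated Newton iteration X_{k+1} = 2 X_k - X_k A X_k with the starting guess X_0 = A^T / (||A||_1 ||A||_inf). This is a minimal sketch under assumptions: the function names, the fixed iteration count, and the list-of-lists matrix format are hypothetical, and it is not the authors' accelerated or ill-conditioning-aware variant.

```python
def matmul(P, Q):
    """Dense matrix product of two list-of-lists matrices."""
    inner, cols = len(Q), len(Q[0])
    return [[sum(row[k] * Q[k][j] for k in range(inner)) for j in range(cols)]
            for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def newton_pinv(A, iterations=30):
    """Approximate the generalized inverse of A by the classical Newton iteration.

    The iteration count is an illustrative assumption, not a tuned stopping rule.
    """
    norm_1 = max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
    norm_inf = max(sum(abs(v) for v in row) for row in A)
    scale = 1.0 / (norm_1 * norm_inf)
    # Starting guess X_0 = A^T / (||A||_1 * ||A||_inf).
    X = [[scale * v for v in row] for row in transpose(A)]
    for _ in range(iterations):
        XAX = matmul(matmul(X, A), X)  # two matrix products per step
        X = [[2.0 * x - y for x, y in zip(x_row, y_row)]
             for x_row, y_row in zip(X, XAX)]
    return X
```

    Each step costs two matrix products and one elementwise update, which is the part that maps naturally onto a parallel machine.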

    Highly parallel sparse Cholesky factorization

    Several fine-grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and used to analyze the algorithms.
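
    For orientation, the dense factorization that serves as the key subroutine can be sketched sequentially as below. This is only a textbook column-oriented Cholesky under assumptions (the function name and list-of-lists storage are hypothetical); it does not reproduce the authors' grid-based data parallel code.

```python
import math


def cholesky(A):
    """Return lower-triangular L with L * L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the contribution of the columns already computed.
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # The entries below the diagonal in column j do not depend on each other,
        # so a data parallel machine could update them simultaneously.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L
```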

    Optimal parallel solution of sparse triangular systems

    A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-hand sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or back substitution. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-hand sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allows for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
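
    The idea of storing the inverse of L as a product of sparse factors can be illustrated with the finest possible partition, one factor per column, where each inverse factor has exactly the sparsity of the corresponding column of L. The sketch below is an assumption-laden illustration (hypothetical function names, dense list-of-lists input); it does not perform the merging of columns into fewer partitions, nor the reordering, that the abstract describes.

```python
def inverse_factors(L):
    """Split inv(L) into one elementary factor per column of L.

    Factor k is the identity except in column k, stored as {row: value}.
    It has nonzeros only where column k of L does, so no fill is created.
    """
    n = len(L)
    factors = []
    for k in range(n):
        d = L[k][k]
        col = {k: 1.0 / d}
        for i in range(k + 1, n):
            if L[i][k] != 0:
                col[i] = -L[i][k] / d
        factors.append(col)
    return factors


def apply_inverse(factors, b):
    """Solve L x = b by applying the stored inverse factors in column order."""
    x = list(b)
    for k, col in enumerate(factors):
        vk = x[k]
        # Every update within one factor is independent, so it is one parallel step.
        for i, c in col.items():
            if i == k:
                x[i] = c * vk
            else:
                x[i] += c * vk
    return x
```

    The factors are computed once and reused for every right-hand side, for example `[apply_inverse(factors, b) for b in right_hand_sides]`. With one factor per column the number of steps equals the matrix order; reducing that count is the partitioning problem the abstract poses.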

    Mapping unstructured grid problems to the connection machine

    We present a highly parallel graph mapping technique that enables one to solve unstructured grid problems on massively parallel computers. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points at every time step or iteration. The time spent communicating can limit the high performance promised by massively parallel computing. To eliminate this bottleneck, we map the graph of the irregular problem to the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. We show that, in comparison to a naive assignment of processors, our heuristic mapping algorithm significantly reduces the communication time on the Connection Machine, CM-2.
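
    The objective described above, the sum of the distances that messages travel, can be made concrete with a small sketch. Two assumptions are made loudly here: processors are addressed as hypercube nodes, so distance is the Hamming distance between addresses, and the improvement loop is a generic pairwise-swap descent used as a stand-in. It is not the authors' heuristic, and all names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Hop count between two hypercube node addresses."""
    return bin(a ^ b).count("1")


def comm_cost(edges, placement):
    """Sum of message distances over all grid edges for a vertex-to-node placement."""
    return sum(hamming(placement[u], placement[v]) for u, v in edges)


def improve_by_swaps(edges, placement):
    """Greedy descent: keep any pairwise swap of vertices that lowers the cost."""
    placement = dict(placement)
    best = comm_cost(edges, placement)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(sorted(placement), 2):
            placement[u], placement[v] = placement[v], placement[u]
            cost = comm_cost(edges, placement)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                placement[u], placement[v] = placement[v], placement[u]
    return placement, best
```

    For a four-vertex ring placed naively on nodes 0 to 3 in order, the cost is 6, and a single swap brings it down to 4, which is the minimum since every edge needs at least one hop.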

    Efficient ICCG on a shared memory multiprocessor

    Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
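
    One common form of static dependence analysis for a sparse triangular solve is level scheduling: a row can be solved as soon as every row it depends on is done, so rows are grouped into levels that can each be processed in parallel. The sketch below shows that general technique only. The abstract does not say which three scheduling methods were implemented, so this is an assumption, and the input format and function name are hypothetical.

```python
def level_schedule(row_deps):
    """Group the rows of a sparse lower-triangular system into parallel levels.

    row_deps[i] lists the columns j < i where L[i][j] is nonzero, that is,
    the unknowns that row i must wait for.
    """
    level = [0] * len(row_deps)
    for i, deps in enumerate(row_deps):
        # A row sits one level after the latest row it depends on.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups
```

    With `row_deps = [[], [], [0, 1], [2], [0]]`, rows 0 and 1 form the first level, rows 2 and 4 the second, and row 3 the third, so five rows are solved in three parallel steps.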

    Optimal expression evaluation for data parallel architectures

    A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
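
    The underlying optimization can be sketched as a brute-force dynamic program over an expression tree: for each node and each candidate alignment, take the cheapest way to bring every operand there. This is a plain exhaustive formulation under assumptions, not the efficient algorithm the abstract refers to. The tree encoding (integers for leaf alignments, tuples for operations), the function name, and the finite list of candidate positions are all hypothetical.

```python
def min_eval_cost(tree, positions, dist):
    """Minimum total movement cost to evaluate an expression tree.

    A leaf is the position (alignment) where an input array already lives.
    An internal node is a tuple of subtrees whose results must be moved to
    a common position before the pointwise operation is performed.
    dist(p, q) is the cost of moving one array from alignment p to q.
    """
    def cost_table(node):
        if not isinstance(node, tuple):
            # Leaf: cost of having this input array available at each position.
            return {p: dist(node, p) for p in positions}
        child_tables = [cost_table(child) for child in node]
        table = {}
        for p in positions:
            # Each operand is computed at its best position q, then moved to p.
            table[p] = sum(min(t[q] + dist(q, p) for q in positions)
                           for t in child_tables)
        return table

    return min(cost_table(tree).values())
```

    For example, on a one-dimensional mesh with `dist = lambda p, q: abs(p - q)` and positions 0 to 4, the tree `((0, 4), 4)` combines arrays at offsets 0 and 4 and then combines the result with another array at offset 4.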

    Plasmonic DNA nanostructures with tailored optical response


    Kompetenzen und Konvergenzen. Globales Lernen im Rahmen der UN-Dekade 'Bildung für Nachhaltige Entwicklung'

    This article presents the orienting and integrative significance of the guiding principle of sustainable development. It draws mainly on the discussion paper of the Verband Entwicklungspolitik deutscher Nichtregierungsorganisationen (the association of German development NGOs) on the UN Decade 'Education for Sustainable Development' and describes the position of global learning within education for sustainable development. With reference to the work of the KMK-BMZ working group on development education, the author proposes key points for a reference curriculum for the learning area 'One World - development education - global learning'. (DIPF/Orig.)